On the distribution of career longevity and the evolution of home run prowess in professional baseball

Statistical analysis is a major aspect of baseball, from player averages to historical benchmarks
and records. Much of baseball fanfare is based around players exceeding the norm, some in a single 
game and others over a long career. Career statistics serve as a metric for classifying players and 
establishing their historical legacy. However, the concept of records and benchmarks assumes that 
the level of competition in baseball is stationary in time. Here we show that power-law probability 
density functions, a hallmark of many complex systems that are driven by competition, govern 
career longevity in baseball. We also find similar power laws in the density functions of all major 
performance metrics for pitchers and batters. The use of performance-enhancing drugs has a dark 
history, emerging as a problem for both amateur and professional sports. We find statistical evidence 
consistent with performance-enhancing drugs in the analysis of home runs hit by players in the last 
25 years. This is corroborated by the findings of the Mitchell Report [1], a two-year investigation
into the use of illegal steroids in major league baseball, which recently revealed that over 5 percent 
of major league baseball players tested positive for performance-enhancing drugs in an anonymous 
2003 survey. 

PACS numbers: 01.80.+b, 89.75.Da, 02.50.Fz



Baseball is a game of legends, mystique, euphoria and 
heartbreak. It is also a game of numbers and records. 
Here we analyze approximately 10,000 players who ended 
their careers between the years 1920 and 2000, where 
1920 is the year widely considered as the beginning of the 
modern era of baseball. We utilize Sean Lahman's Baseball
Archive [2], an exhaustive database consisting of Major
League Baseball player statistics dating back to 1871.
This database was meticulously constructed, going so far 
as to extract data from old newspaper reels. We find that 
baseball players have universal properties despite the dis- 
tinct eras in which they played. Specifically, we find that 
the probability density functions of career totals obey 
scale-free power laws over a large range for all metrics 
studied. As usual, the probability density function $P(x)$
is defined such that the probability of observing an event
in the interval $(x, x + \delta x)$ is $P(x)\,\delta x$. Power-law density
functions, $P(x) \sim x^{-\alpha}$, arise in many complex systems where
competition drives the dynamics [3, 4, 5, 6, 7, 8, 9]. A
key feature of the scale-free power law is the disparity
between the most probable value and the mean value of
the distribution [10]. For a Gaussian distribution, these
two values coincide. However, with a power law, the
most probable value $x_{mp} = 1$, while the mean value $\langle x \rangle$
diverges for $\alpha < 2$. Thus, in power-law distributed phenomena,
there are rare extreme events that are orders of
magnitude greater than the most common events. This 
leads naturally to the notion of record events and the 
statistical analysis of sample extremes [11]. We begin
this letter with an analysis of career longevity in Amer- 
ican baseball. Because the legacy of a player is based 
mainly upon his career totals, we also discuss the impli- 
cations of the power-law behavior found in common ca- 
reer metrics. We conclude with empirical evidence, found 




FIG. 1: Probability density function of player longevity.
Longevity is defined as the number of outs pitched (pitchers)
and the number of at-bats (all batters) for players ending their
careers in the years 1920-2000. The power law extends over more
than three orders of magnitude, with $\alpha \approx 1$. For reference,
the straight line represents the power law $P(x) \sim x^{-1}$.



in home run statistics, which is consistent with modern 
performance-enhancing factors including widespread use 
of performance-enhancing drugs. 
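
As a worked illustration of the disparity between the most probable value and the mean, consider a pure power law with $\alpha = 1$ on $1 \le x \le x_c$. The hard upper cutoff $x_c$ is an assumption made here only to keep the integrals elementary; it stands in for a truncation of the distribution and is not a form fitted in this letter.

```latex
% Normalized power law with exponent alpha = 1 and an assumed hard cutoff x_c
P(x) = \frac{1}{x \ln x_c}, \qquad 1 \le x \le x_c ,
\qquad
\int_1^{x_c} P(x)\,dx = 1 .

% Mean value: grows almost linearly in x_c, so it diverges as x_c -> infinity
\langle x \rangle = \int_1^{x_c} x\,P(x)\,dx = \frac{x_c - 1}{\ln x_c} .

% Example: x_c = 10^4 gives <x> = 9999 / \ln(10^4) \approx 1.1 \times 10^{3},
% three orders of magnitude above the most probable value x_{mp} = 1.
```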

In Fig. 1 we present the longevity of a player's career
measured in at-bats (AB) and innings-pitched measured 
in outs (IPO). For these two metrics, we find truncated 
power-law distributions that range over three decades, 
marked by a sharp exponential cutoff at a value corre- 
sponding to around twenty seasons. It should be noted 
that unlike a complete power-law distribution with $\alpha \approx 1$,
which has a divergent first and second moment, a truncated
distribution has a definite mean and second moment.

FIG. 2: Probability density function of career statistics in
four categories: (a) Hits, (b) Runs Batted In, (c) K (strikeouts),
(d) Wins. Plotted for each statistical metric are the
distributions of career totals for players whose careers ended
in the periods 1920-1960 and 1960-2000. The pairs of distributions
are all qualitatively similar, with the exponential
cutoffs occurring at the same critical value, indicating that
the competition level in baseball has been relatively constant
with respect to these career metrics.

To our surprise, we find that the distributions
for career longevity have their maxima around 1 appear- 
ance. This implies that most players who make it to the 
major leagues do not remain for very long, possibly mak- 
ing their professional debut and exit in a single pinch-hit 
or relief appearance. This leads to a perplexing feature 
of scale-free power laws, namely that it is just as hard 
to reach your 10th appearance from your debut appear- 
ance as a rookie as it is to reach your 10,000th appear- 
ance from your 1000th appearance as a seasoned veteran. 
In other words, the ratio $P(x_2)/P(x_1) = (x_2/x_1)^{-\alpha}$ depends
only on the scale-free ratio $x_2/x_1$ and the universal
exponent $\alpha$. This raises a fundamental question
addressing longevity in American baseball: How is it 
possible that the same level of competition can elimi- 
nate some players after one appearance while sustaining 
others for more than two decades? American baseball 
has a 3-tier farm system, collectively known as the minor 
leagues. These developmental leagues filter talent up to 
the major leagues, with only the best players staying at 
the major league level. Occasionally there are opportuni- 
ties for minor league players to be promoted to the major 
leagues for short unguaranteed stints, either if their ma- 
jor league affiliate has a roster vacancy due to injury or 
if their major league affiliate is not in a position to make 
the post-season. The long regular season provides ample 
opportunity for these major league tryouts, thus account- 
ing for the high frequency of short careers. 

In Fig. 2 we plot the distribution of career batting
and pitching totals for all players who ended their ca- 
reers between the years of 1920-1960 and 1960-2000 (we 
restrict our analysis to completed careers). Separating 
players into two subsets allows us to compare careers belonging
to each era, where 1961 marks the beginning of
the first expansion era in major league baseball. We also
find truncated power-law behavior with exponent $\alpha \approx 1$
for all major career metrics. This should not be too sur- 
prising since each opportunity (defined in this paper as 
an at-bat or out-pitched) is capitalized upon at a player's 
personal rate (defined in this paper as his prowess); each 
success then contributes to the player's career statisti- 
cal tally. Thus, the exponent from the career longevity 
power-law should carry over naturally into the density 
functions of career metrics [I4I. In the case of batting 
statistics, we make no distinction between pitchers and 
other fielders who are on record for their at-bats. One 
can also do a statistical analysis on players who do not
appear in the pitching database, but the distributions are
not qualitatively different. Thus, career longevity mea- 
sured in at-bats indicates that there is a large disparity 
between the "iron-horses" and the "one-hit wonders". It
is perplexing that there is such a wide range of career 
lengths despite the typical prowess that distinguishes the 
upper echelon of baseball talent. It should also be noted 
that in the game of baseball there are two classes of pitch- 
ers, those that start games, and those that finish games. 
Pitchers of the first type have routine schedules, pitching 
once every four or five games in a maintained rotation. 
Pitchers of the second type pitch more frequently, with 
game sessions that are shorter, hardly ever exceeding 2 
innings (6 outs pitched). Despite these two classes of
pitchers, the longevity measure of outs pitched shows no
evidence of bimodal behavior. One can even
notice fluctuations at the beginning of the distribution
for outs pitched, with sharp peaks corresponding to 1-inning
(3 outs) and 2-inning (6 outs) stints. Comparing
pitchers and batters, there is a remarkable similarity
in power-law exponent corresponding to longevity, fol- 
lowing from the fact that it is very difficult to reach, 
and to remain, at the major league level. Moreover, the 
distributions are nearly equivalent, with the exponential 
cutoff occurring at approximately the same value. This 
justifies both the 3000-hit and the 3000-strikeout benchmarks
for batters and pitchers, respectively, and suggests that
career longevity results from a universal mechanism that 
is invariant with respect to player type. Baseball relies 
on precision play, requiring quick physical and mental re- 
flex. The flow of the game is characterized by periods of 
lull, interlaced with bursts of activity, with both offense 
and defense capitalizing on sprinting, diving, and sliding 
plays [13]. In addition, throwing a baseball is very strenuous
on the arm. Thus, although not a contact sport in
the sense of hockey, rugby, or American football, base- 
ball players are still prone to injury, some of which are 
career-ending. The perpetual hazard of career-ending replacement
or injury provides the main ingredient for explaining
the observed power laws. In Ref. [12] we propose
a simple stochastic mechanism for career longevity in professional
sports which reproduces both the power-law behavior
and the exponential cutoff. We follow the explanation
of Reed et al. [14], which shows that stochastic

FIG. 3: Probability density function of seasonal home-run
prowess (pitchers are excluded from this analysis). The exponential
distribution, representing players in the years 1920-1979,
is skewed towards smaller rates, indicating that even at
the major league level, the ability to consistently hit home
runs is rare. Players from the last 25 years have increased
their ability to hit home runs, possibly as a result of modern
training regimens, performance-enhancing drugs, expansion-based
dilution of talent, and other hypothetical factors. (Inset)
Probability density function of seasonal prowess for several
key metrics over the seasons 1920-2000. These are centered
distributions with a mean $\langle x \rangle$ and standard deviation
$\sigma$ that define the talent level in the major leagues.

FIG. 4: Average league prowess $\langle P \rangle$ calculated for 5-year periods
starting in 1900. The trendline for HR/AB has been
multiplied by a factor of 4 for clarity. The difference between the
trendlines for HR per hit for batters and pitchers is essentially
constant over the years. Following from the form of our
weighted average, this suggests that the changes in home run 
prowess are not due to fluctuations in the talent pool aris- 
ing from expansion dilution. If more home runs were being 
hit by veteran batters against poor pitching, then a spread 
would appear, assuming that poor pitchers don't stay at the 
major league level very long. Instead, we see that veteran hit- 
ters (with a large fraction of the total hits) are hitting more 
home runs off of veteran pitchers (with a large fraction of hits 
surrendered, and thus many innings pitched). 



processes with exponential growth produce power-laws 
when the process is subject to random stopping times. 
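
A minimal simulation can illustrate this stopping-time argument. The sketch below is not the career model of the authors' companion work; it assumes the simplest possible case, deterministic growth exp(μt) observed at an exponentially distributed stopping time with rate λ, for which P(X > x) = x^(-λ/μ). The function names and parameter values are hypothetical.

```python
import math
import random


def simulate_stopped_growth(n, growth_rate, stop_rate, seed=0):
    """Draw n values of exp(growth_rate * T), with T ~ Exponential(stop_rate)."""
    rng = random.Random(seed)
    return [math.exp(growth_rate * rng.expovariate(stop_rate)) for _ in range(n)]


def tail_exponent(samples):
    """Maximum-likelihood estimate of k in P(X > x) = x**(-k) for x >= 1."""
    return len(samples) / sum(math.log(x) for x in samples)


# Illustrative parameters: growth and stopping rates both equal to 1.
samples = simulate_stopped_growth(100_000, growth_rate=1.0, stop_rate=1.0)
k_hat = tail_exponent(samples)

# Theory for this toy case: tail exponent k = stop_rate / growth_rate,
# i.e. a density P(x) ~ x**(-(1 + k)).
print(f"estimated tail exponent: {k_hat:.3f}")
```

In this toy case the exponent is set entirely by the ratio of the stopping rate to the growth rate, which is why random exits from a growth process are enough to produce a power law.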

In Fig. 3 we analyze seasonal home run prowess, defined
as the rate per at-bat at which a particular player
hits a home run. Pitchers are excluded from this analysis.
We also restrict our analysis to players who exceed
$N$ appearances in a given season, and use $N = 100$
to eliminate statistical fluctuations arising from short-lived
success. The seasonal prowess distributions for
some common batting and pitching metrics are relatively 
unskewed, defined by a characteristic standard deviation 
around a central mean (Fig. 3 inset). Thus, there is
a typical success rate that defines not only the players, 
but also the relative level of competition between pitcher 
and batter at the major league level. In contrast, the 
seasonal prowess distributions for home runs are more 
exponentially distributed (Fig. 3). These distributions
are skewed towards small values, indicating that it is 
rare for players to have prowess that consistently produces
home runs. We also note that the distributions
for home-run prowess over the past 25 years reveal a
shift towards players with higher home-run ability. This 
increase in home-run prowess could result from modern 
natural weight-training programs with or without the use 
of performance-enhancing drugs. Other theories suggest 
that maple bats, a reduced strike-zone, and league expan- 
sion all contribute to the increased home run performance 
of modern players in the "Steroid Era". A recent study
by R. Tobin [15] demonstrates that a reasonable increase
in a player's muscle mass, say 10 percent, can
produce a significant increase in home run production,
ranging from 30 to 70 percent depending on systemic
parameters. Thus, our findings are consistent not
only with the factual revelations of the Mitchell Report, 
but also with the aforementioned Monte-Carlo simula- 
tions. 
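
The seasonal prowess measure used in this analysis can be sketched as follows. The sketch assumes that an "appearance" for a batter means an at-bat and that each player season is available as a (home runs, at-bats) pair; that layout, the function names, and the toy numbers are hypothetical and not taken from the database.

```python
def seasonal_prowess(seasons, min_at_bats=100):
    """Return HR/AB for every player season with more than min_at_bats at-bats.

    seasons: iterable of (home_runs, at_bats) pairs (assumed layout).
    """
    return [hr / ab for hr, ab in seasons if ab > min_at_bats]


def exponential_scale(rates):
    """Maximum-likelihood scale of an exponential density: the sample mean."""
    return sum(rates) / len(rates)


# Invented player seasons, for illustration only.
toy_seasons = [(2, 450), (0, 12), (31, 560), (1, 100), (9, 380)]

rates = seasonal_prowess(toy_seasons)   # drops the 12 and 100 at-bat seasons
scale = exponential_scale(rates)
print(rates, scale)
```

The threshold removes seasons too short for a rate to be meaningful, and for an exponentially distributed prowess the sample mean is the single parameter characterizing the distribution.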

It has been known for some time that home run rates 
over the last two decades have been rising [16], accompanied
by home run records falling. In Fig. 4 we plot
the average prowess of several metrics over 5 year win- 
dows from 1900-2005 in order to investigate the evolution 
of home run prowess. If in a single season player $i$ has
prowess $P_i = x_i/y_i$, then we compute the weighted
average over all players

$$\langle P \rangle = \sum_i w_i P_i = \frac{\sum_i x_i}{\sum_i y_i}, \qquad w_i = \frac{y_i}{\sum_i y_i}.$$

The index $i$ runs over all individual player seasons during
the period $T$, and $\sum_i y_i$ is the total number of events $y$
during the same period. The first era of increasing home-run
prowess followed the 1920 revision of the rules (such




FIG. 5: Statistical evidence in career home run distributions 
consistent with performance-enhancing drugs. Probability
density function of home runs hit over a player's career
ending in two different time periods, before and after 1980 
(pitchers are excluded). More home runs are being collected 
in the extreme part of the distribution by individuals ending 
their careers in the last 25 years, 1980-2004, marked by the 
"steroids era". 



confined to the highly sensitive exponential tail, whereas 
the differences in the career statistics for home runs 
extend into the bulk of the distribution. The use of
steroids was most recently documented in the Mitchell
Report [1], a two-year investigation into the use of
performance-enhancing drugs in major league baseball.
The report places a lower limit of 5 percent on the extent
of steroid use in major league baseball, based on the results
of a set of anonymous 2003 blood tests that confirmed
the widespread use of performance-enhancing drugs
among major league players. Other Mitchell Report
assessments, based on personal accounts, suggest much
higher percentages of steroid use in professional baseball.
Steroids and other performance-enhancing drugs
can be used for two general reasons, to gain strength and 
to reduce recovery-time from both workouts and injury. 
One might expect that performance-enhancing drugs 
would raise the level of competition across the board, for 
pitchers as well as batters, since both increased strength 
and speedy recovery can contribute to high career 
tallies. However, in our analysis of career statistics, 
we see evidence for a competitive advantage mainly in
the case of home runs. This suggests that the level of 
competition between pitcher and batter is tipping in the 
favor of the batter. 



as the outlawing of the "spit-ball") which made the batter
and pitcher more equally competitive. This was followed 
by the emergence of sluggers such as Babe Ruth, who 
popularized the herculean feat of hitting home runs [17].
The first expansion era 1961-1969 saw 8 new teams, ac- 
companied by a decrease in average home-run prowess. It 
is important to note that expansion within a league has 
two main effects. On the player level, expansion dilutes 
the talent in pitching and batting. This allows excellent 
players to take advantage of their weaker foe, and has 
been proposed as a possible factor responsible for the increased
home run rate during the 1990s [18]. On the
team level, the authors of Ref. [19] show that the level of
team competition measured in team-versus-team upset 
probability increases during expansion eras. The second 
expansion era 1993-1998 saw 4 new teams, accompanied 
by an increase in average home-run prowess following ap- 
proximately 20 years of stagnancy. 
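
The weighted league average plotted in Fig. 4 can be sketched in a few lines. The sketch assumes that each player season's prowess x/y is weighted by its number of opportunities y, so that the average over a window reduces to the summed successes divided by the summed opportunities. The tuple layout, the function name, and the toy numbers are hypothetical.

```python
from collections import defaultdict


def weighted_prowess(player_seasons, window=5, start=1900):
    """Opportunity-weighted average prowess per time window.

    player_seasons: iterable of (year, x, y), where x counts successes
    (e.g. home runs) and y counts opportunities (e.g. at-bats).
    Returns {window_start_year: sum(x) / sum(y)}.
    """
    sums = defaultdict(lambda: [0, 0])
    for year, x, y in player_seasons:
        if y <= 0 or year < start:
            continue  # skip seasons with no opportunities or before the range
        w = start + window * ((year - start) // window)
        sums[w][0] += x
        sums[w][1] += y
    return {w: sx / sy for w, (sx, sy) in sorted(sums.items())}


# Invented player seasons, for illustration only.
toy = [(1920, 5, 100), (1921, 0, 10), (1927, 30, 500)]
result = weighted_prowess(toy)
print(result)
```

Under this weighting a player with 10 at-bats and no home runs barely moves the window average, whereas a plain mean of individual rates would give that season the same influence as a full-time player.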

Because career statistics serve as key metrics for clas- 
sifying players and establishing their historical legacy, we 
separate the players in Fig. 5 into two subsets, players
ending their careers before and after 1980, in order 
to compare career home run totals. We find that the 
last 25 years account for many more players with large 
career home-run tallies. Interestingly, there is similar 
evidence in the strikeout tallies of pitchers (Fig. 2),
which suggests that modern sluggers may be "swinging 
for the fences" with reckless abandon. We should note 
that the difference in career statistics for strikeouts is 



Major league baseball is a unique sport, relying on 
team play while also maintaining a platform for indi- 
vidual play, namely pitcher versus batter. It also has 
a deep developmental minor league system that filters 
out the best talent, and serves as an emergency source
for randomly depleted team rosters. This provides an 
explanation for the abundance of hitters and pitchers 
who experience both their debut and finale in the same 
game. In Ref. [12] we analyze career longevity in Korean
baseball, American basketball, and English football. We 
find the same power-law behavior with exponential cut- 
off governing career statistics in all these professional 
sports, and provide a stochastic mechanism to explain 
the distribution of career length in competitive environ- 
ments that are subject to random exit times. It should 
also be noted that performance-enhancing drugs are not 
a problem unique to American baseball. A separate 
study of English football also revealed widespread use 
of performance-enhancing drugs [20]. Moreover, it is not
just a problem pertaining to professionals, but to amateurs
and adolescents as well [21], as performance-enhancing
drugs are the core of a pandemic that not only poses personal
health risks, but also places the integrity of sports
in jeopardy [22, 23]. And finally, crossing over into the
academic world, a recent study in the journal Nature [24]
reveals that cognitive-enhancing drugs are prevalent in
the sciences, and pose the same ethical questions that
apply to accomplishments in sports [25].